A SIMPLE UNCONSTRAINED DUAL CONVEX PROGRAMMING METHOD FOR
THE COMPUTATION OF DISCRETE MAXIMUM ENTROPY DISTRIBUTIONS


P. Brockett and K. Paick


ABSTRACT

We formulate the generalized constrained maximum entropy problem, often used in a decision making context, as an extended dual convex programming problem. We then present the dual problem. In this dual setting the primal Lagrange multipliers are precisely the dual variables, and they are easily calculated directly by virtue of the simple structure of the dual problem. An example involving the selection of the best equipment for an oil spill is presented as an illustration, and we contrast our solution with those given by previous authors.
The notion of information is strongly connected to the amount of uncertainty. In many problems encountered in operations research practice it is useful to estimate the discrete probability distribution of a random phenomenon under uncertainty. The general maximum entropy principle is a very useful method for incorporating the uncertainty of a situation into a probability distribution when one is trying to make the most of limited knowledge and resources.

Maximum entropy estimation, a special case of minimum discrimination information estimation, has been used in numerous fields (e.g. Brockett et al. [1984], Thomas [1979]).

In a recent paper, Freund and Saxena [1984] gave an algorithm to compute maximum entropy probability estimates. In this paper we present a much more general and much easier computational method for obtaining these estimates. Additionally, our method easily extends to the computation of minimum discrimination information estimates as well.

The bulk of this paper centers on the development of a dual convex programming formulation for maximum entropy estimation, and shows how to view maximum entropy estimation from this dual convex programming point of view. We then point out the analytical properties of the estimates which follow directly from the form of this duality. Section I contains the mathematical formulation of maximum entropy. In Section II we present the unconstrained dual formulation for the Lagrange multipliers due to Charnes and Cooper [1975] and Charnes, Cooper and Seiford [1978]. Section III contains an application of the unconstrained dual convex programming method to an oil spill equipment selection problem considered in Thomas [1979] and also in Freund and Saxena [1984]. The algorithm we present is more general and computationally easier than that given in Freund and Saxena [1984].

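I. MAXIMUM ENTROPY

Mathematically, the problem of maximum entropy estimation is to determine the density function p which is maximally uncertain and which satisfies certain given constraints, e.g.

$$
\begin{aligned}
\max\quad & H(p) = -\int p(x)\,\ln[p(x)]\,\lambda(dx) \\
\text{s.t.}\quad & \int h_0(x)\,p(x)\,\lambda(dx) = \theta_0 = 1 \\
& \int h_1(x)\,p(x)\,\lambda(dx) = \theta_1 \\
& \qquad \vdots \\
& \int h_k(x)\,p(x)\,\lambda(dx) = \theta_k
\end{aligned}
$$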

Here $\lambda$ is some dominating measure for p (usually Lebesgue measure in the continuous case, or counting measure in the discrete case), $\theta_1, \ldots, \theta_k$ are the given constant values for a known set of moment functions $h_1, \ldots, h_k$, and $h_0(x) = 1$. Frequently, in hypothesis testing or in the estimation of an inferential distribution by maximum entropy, one has information about the possible candidate distribution in the form of inequality constraints in addition to equality constraints. Range constraints on the probability distribution can easily be transformed into inequality constraints. In this case we can add the following constraints to the original constraints:
$$
\begin{aligned}
& \int h_{k+1}(x)\,p(x)\,\lambda(dx) \le \theta_{k+1} \\
& \qquad \vdots \\
& \int h_{k+n}(x)\,p(x)\,\lambda(dx) \le \theta_{k+n}.
\end{aligned}
$$

The explicit calculation of the maximum entropy density subject to the given constraints is carried out by Lagrange multipliers. Since computing the maximum entropy estimate entails solving highly non-linear constraint equations, it has been difficult to solve the Lagrange multiplier system explicitly in order to obtain a closed-form solution expressed directly in terms of the known expected values $\theta$ (Leblanc and Reisher [1981], Brockett et al. [1980]). For this reason certain numerical solutions were derived by approximation (a first-order approximation by Guiasu [1980], a second-order approximation by Leblanc and Riesher [1981]). Furthermore, the solutions in Thomas [1980] and Freund and Saxena [1984] turn out to be different from the optimal solutions. We will discuss these examples in Section III.
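To see why the multiplier system resists a closed form, consider the simplest discrete case: one moment constraint besides normalization. The maximum entropy solution then has the exponential form $p_i \propto \exp(\lambda h_1(x_i))$, and the single multiplier $\lambda$ must satisfy one non-linear equation in $\lambda$. The minimal Python sketch below solves that equation by bisection; the helper name `maxent_one_moment`, the support points 1 to 6 and the target mean 4.5 are hypothetical illustration values, not data from this paper.

```python
import math


def maxent_one_moment(xs, theta, lo=-50.0, hi=50.0, tol=1e-12):
    """Hypothetical helper: max entropy p on support xs with sum(p) = 1, sum(p*x) = theta.

    The solution is p_i = exp(lam * x_i) / Z(lam). The mean is increasing in lam
    (its derivative is a variance), so mean(lam) = theta is solved by bisection.
    """
    def dist(lam):
        m = max(lam * x for x in xs)          # shift exponents for numerical stability
        w = [math.exp(lam * x - m) for x in xs]
        z = sum(w)
        return [wi / z for wi in w]

    def mean(lam):
        return sum(pi * x for pi, x in zip(dist(lam), xs))

    if not (min(xs) < theta < max(xs)):
        raise ValueError("theta must lie strictly inside the range of xs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < theta:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, dist(lam)


# Hypothetical example: outcomes 1..6 with a prescribed mean of 4.5
lam, p = maxent_one_moment([1, 2, 3, 4, 5, 6], 4.5)
```

With more than one moment the equations couple and a scalar search no longer suffices, which is the difficulty the dual formulation of Section II addresses.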

II. UNCONSTRAINED DUAL PROGRAMMING APPROACH TO ESTIMATION.

In the first part of this section we present maximum entropy estimation in the discrete case via dual convex programming with only non-positivity constraints. These results are special cases of the results given in Charnes and Cooper [1975] and Charnes, Cooper and Seiford [1978].

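Theorem 2.1

The following linearly constrained maximum entropy primal problem

$$
\text{(P)}\qquad \sup\; v(\delta) = -\delta^t \ln \delta
\quad \text{s.t.}\quad \delta^t A^1 = b^{1t}, \quad \delta^t A^2 \le b^{2t}
$$

has the dual problem

$$
\text{(D)}\qquad \inf\; \xi(z) = \sum_j \exp\!\left(A^1 z^1 + A^2 z^2\right)_j - b^{1t} z^1 - b^{2t} z^2
\quad \text{s.t.}\quad z^2 \le 0,
$$

where the exponential of a vector is taken componentwise.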

There are three mutually exclusive and collectively exhaustive duality states:

(1) $\Delta = \{\delta : \delta^t A^1 = b^{1t},\ \delta^t A^2 \le b^{2t},\ \delta \ge 0\} = \emptyset$, and $\xi(z)$ is unbounded below.

(2) $\Delta \ne \emptyset$, every feasible solution $\delta \in \Delta$ of (P) has a zero component, and $\xi(z)$ with non-positive $z^2$ has only an infimum (no minimum is attained). In this case $\inf \xi(z) = \max v(\delta) = \min \xi_0(z)$, where $\xi_0(z)$ contains only those terms of $\xi(z)$ for which $\delta_j > 0$ for some $\delta \in \Delta$.

(3) There exists $\delta \in \Delta$ with $\delta > 0$, and $\xi(z)$ has a minimum at $z^*$. In this case the following relationships hold between the optimal primal and dual variables:

a) $\inf_z \xi(z) = \sup_\delta v(\delta) = \max_\delta v(\delta) = \min_z \xi(z)$;

b) $v(\delta)$ has a unique maximum at $\delta^* > 0$;

c) $\delta^* = \exp\!\left[A^1 z^{1*} + A^2 z^{2*}\right]$ (componentwise).

Note that state (3) is the usual state considered in applied problems.

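Proof (adapted from Charnes, Cooper and Seiford [1978])

The constraints of the primal problem may be written as

$$
\delta^t A^1 = b^{1t}, \qquad \delta^t A^2 + \gamma^t = b^{2t}, \qquad \delta,\ \gamma \ge 0. \tag{2.1}
$$

Here $A^1$ is $m_1 \times n_1$ and $A^2$ is $m_1 \times n_2$. By the duality inequality,

$$
-\sum_{i \in \mathcal{A}} \delta_i \ln \delta_i \le \sum_{i \in \mathcal{A}} \bigl(\exp(x_i) - \delta_i x_i\bigr) - \sum_{i \in \mathcal{B}} \gamma_i y_i
$$

with (2.1) and $y_i \le 0$, $i \in \mathcal{B}$, where $\mathcal{A} = \{1, \ldots, m_1\}$ and $\mathcal{B} = \{m_1 + 1, \ldots, m_2\}$; i.e.

$$
-\delta^t \ln \delta \le \min_{x,\; y \le 0} K(\delta,\gamma,x,y), \qquad K(\delta,\gamma,x,y) = \sum_i \exp(x_i) - (\delta^t x + \gamma^t y), \quad \delta, \gamma \ge 0.
$$

To decouple, we obtain $\delta^t x + \gamma^t y = b^{1t} z^1 + b^{2t} z^2$ from

$$
(\delta^t, \gamma^t)\begin{bmatrix} x \\ y \end{bmatrix}
= (\delta^t, \gamma^t)\begin{bmatrix} A^1 & A^2 \\ 0 & I \end{bmatrix}\begin{bmatrix} z^1 \\ z^2 \end{bmatrix}
= (b^{1t}, b^{2t})\begin{bmatrix} z^1 \\ z^2 \end{bmatrix},
$$

where we have set

$$
\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A^1 & A^2 \\ 0 & I \end{bmatrix}\begin{bmatrix} z^1 \\ z^2 \end{bmatrix}.
$$

Thus we obtain the dual problem

$$
\text{(D)}\qquad \inf\; \xi(z) = \sum_j \exp\!\left(A^1 z^1 + A^2 z^2\right)_j - b^{1t} z^1 - b^{2t} z^2 \quad \text{s.t.}\quad z^2 \le 0.
$$

Because state (3) is the most usual state encountered, we present the proof for state (3) only. The proofs for states (1) and (2) may be found in Charnes, Cooper and Seiford [1978].

Let $K(\delta,\gamma,x,y) = \sum_i \exp(x_i) - \delta^t x - \gamma^t y$ for $\delta, \gamma \ge 0$, $x \in \mathbb{R}^{m_1}$, and define $g(\delta) = \inf K(\delta,\gamma,x,y) = -\delta^t \ln \delta$. Because of the constraint

$$
(\delta^t, \gamma^t)\begin{bmatrix} A^1 & A^2 \\ 0 & I \end{bmatrix} = (b^{1t}, b^{2t}),
$$

and by setting $x = A^1 z^1 + A^2 z^2$ and $y = z^2$, we have

$$
\begin{aligned}
g(\delta) = -\delta^t \ln \delta &\le \sum_i \exp(x_i) - \delta^t x - \gamma^t y \\
&= \sum_i \exp\!\left(A^1 z^1 + A^2 z^2\right)_i - \delta^t\left(A^1 z^1 + A^2 z^2\right) - \gamma^t z^2 \\
&= \sum_i \exp\!\left(A^1 z^1 + A^2 z^2\right)_i - b^{1t} z^1 - b^{2t} z^2,
\end{aligned}
$$

which holds for all $z^1$ and all $z^2 \le 0$, all $\delta \in \Delta = \{\delta : \delta^t A^1 = b^{1t},\ \delta^t A^2 \le b^{2t},\ \delta \ge 0\}$, and all $\gamma \in \Gamma = \{\gamma : \delta^t A^2 + \gamma^t = b^{2t},\ \gamma \ge 0\}$. The equality holds for $\delta^* = \exp\!\left[A^1 z^{1*} + A^2 z^{2*}\right]$.

Q.E.D.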


If the requisites for state determination are not obvious, the state may be characterized by means of the following linear programming problem:

$$
\begin{aligned}
\max\quad & u \\
\text{s.t.}\quad & u\,w^t - \delta^t \le 0 \\
& \delta^t A^1 = b^{1t} \\
& \delta^t A^2 \le b^{2t} \\
& \delta \ge 0
\end{aligned}
$$

where $w^t = (1, 1, \ldots, 1)$. State (1) corresponds to infeasibility, state (2) corresponds to $u = 0$, and state (3) corresponds to $u > 0$. Clearly no linear independence requirement is imposed, and all possible behaviors of the system A are covered.
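A sketch of this state test using SciPy's `linprog` with the HiGHS backend follows. The variable vector stacks $u$ in front of $\delta$; the helper name `duality_state`, the tolerance, the cap $u \le 1$ (added by us only to keep the LP bounded when $A^1$ has no normalizing column) and the three small test systems are hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import linprog


def duality_state(A1, b1, A2, b2, tol=1e-9):
    """Hypothetical helper: return duality state 1, 2 or 3 of (P) via the LP
    max u  s.t.  u <= delta_j (all j),  delta^t A1 = b1^t,  delta^t A2 <= b2^t,  delta >= 0.
    """
    n = A1.shape[0]                      # number of primal variables delta_j
    c = np.zeros(n + 1)
    c[0] = -1.0                          # linprog minimizes, so minimize -u
    # Rows u - delta_j <= 0, followed by the rows of A2^t delta <= b2
    A_ub = np.vstack([
        np.hstack([np.ones((n, 1)), -np.eye(n)]),
        np.hstack([np.zeros((A2.shape[1], 1)), A2.T]),
    ])
    b_ub = np.concatenate([np.zeros(n), b2])
    A_eq = np.hstack([np.zeros((A1.shape[1], 1)), A1.T])
    # 0 <= u <= 1: the cap only keeps the LP bounded; only the sign of u matters
    bounds = [(0.0, 1.0)] + [(0.0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b1,
                  bounds=bounds, method="highs")
    if res.status == 2:                  # LP infeasible: Delta is empty
        return 1
    if res.status != 0:
        raise RuntimeError(res.message)
    return 3 if res.x[0] > tol else 2


# Three hypothetical 3-outcome systems, each including the normalizing constraint
ones, one = np.ones((3, 1)), np.array([1.0])
s3 = duality_state(ones, one, np.array([[1.0], [0.0], [0.0]]), np.array([0.2]))
s2 = duality_state(ones, one, np.array([[1.0], [1.0], [0.0]]), np.array([0.0]))
s1 = duality_state(ones, one, np.ones((3, 1)), np.array([0.5]))
```

In the second system the inequality forces $\delta_1 = \delta_2 = 0$, so every feasible point has a zero component and $u = 0$; in the third, the probabilities cannot sum to 1 while summing to at most 0.5.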

This result is very attractive since the dual problem (D) is a convex programming problem involving only exponential and linear terms, with a non-positivity constraint on $z^2$. Moreover, the desired Lagrange multipliers for the maximum entropy estimate (P) are precisely the dual variables of (D), and (D) is easily solved numerically because of its simply constrained nature (it is even unconstrained when the primal has only equality constraints). Any of a number of readily available non-linear programming codes can be used to solve the dual formulation. Moreover, since we know explicitly the parametric form of the optimizing density in terms of the unknown Lagrange multipliers, and this form is unique and continuous in the unknown parameters, the procedure of obtaining the Lagrange parameters via the dual convex programming problem and then substituting them into the parametric form is numerically stable.
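As a minimal sketch of solving (D) with an off-the-shelf code, the Python below uses SciPy's bound-constrained L-BFGS-B method (our choice; the paper does not prescribe a solver). The helper name `solve_dual` and the four-outcome toy problem ($p_1 \le 0.1$) are hypothetical. It assumes the normalizing constraint appears as a column of ones in $A^1$ with matching entry 1 in $b^1$. The estimate is recovered from the parametric form $\delta^* = \exp(A^1 z^{1*} + A^2 z^{2*})$, and its entropy is evaluated directly from $\delta^*$.

```python
import numpy as np
from scipy.optimize import minimize


def solve_dual(A1, b1, A2, b2):
    """Hypothetical helper: minimize xi(z) = sum(exp(A1 z1 + A2 z2)) - b1.z1 - b2.z2
    subject to z2 <= 0, and return (delta, z) with delta = exp(A1 z1 + A2 z2).
    Assumes A1 contains the normalizing column of ones (with b1 entry 1).
    """
    m1, m2 = A1.shape[1], A2.shape[1]
    A = np.hstack([A1, A2])              # equality columns first, then inequalities
    b = np.concatenate([b1, b2])

    def xi_and_grad(z):
        delta = np.exp(A @ z)            # parametric form of the primal solution
        return delta.sum() - b @ z, A.T @ delta - b

    # z1 is free; each component of z2 is restricted to be non-positive
    bounds = [(None, None)] * m1 + [(None, 0.0)] * m2
    res = minimize(xi_and_grad, np.zeros(m1 + m2), jac=True,
                   method="L-BFGS-B", bounds=bounds,
                   options={"gtol": 1e-10, "ftol": 1e-15})
    z = res.x
    return np.exp(A @ z), z


# Hypothetical toy problem: 4 outcomes, probabilities sum to 1, and p_1 <= 0.1
A1 = np.ones((4, 1)); b1 = np.array([1.0])
A2 = np.array([[1.0], [0.0], [0.0], [0.0]]); b2 = np.array([0.1])
delta, z = solve_dual(A1, b1, A2, b2)
entropy = -np.sum(delta * np.log(delta))
```

Here the bound on $p_1$ is active, so its multiplier $z^2$ is strictly negative, and the remaining probability 0.9 is spread evenly over the other three outcomes.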

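An alternative structure for the dual problem is a one-parameter sequence of problems in equality form. In order to make $K(\delta,\gamma,x,y)$ more symmetric in $\delta$ and $\gamma$, and to remove the restriction $y \le 0$, we adopt the same procedure as before. Without loss of generality we can change (P) into (P′):

$$
\begin{aligned}
\text{(P}'\text{)}\qquad \max\quad & -\delta^t \ln \delta - \varepsilon\,\gamma^t \ln \gamma \\
\text{s.t.}\quad & \delta^t A^1 = b^{1t} \\
& \delta^t A^2 + \gamma^t = b^{2t} \\
& \delta,\ \gamma > 0, \quad \text{where } \varepsilon > 0.
\end{aligned}
$$

Let $K(\delta,\gamma,x,y) = \sum_i \exp(x_i) + \varepsilon \sum_j \exp(y_j/\varepsilon) - \delta^t x - \gamma^t y$ with $\varepsilon > 0$. By the Charnes-Cooper duality theorem [1975], the following inequality holds:

$$
-\delta^t \ln \delta - \varepsilon\,\gamma^t \ln \gamma \le K(\delta,\gamma,x,y).
$$

Because of the given constraint

$$
(\delta^t, \gamma^t)\begin{bmatrix} A^1 & A^2 \\ 0 & I \end{bmatrix} = (b^{1t}, b^{2t}),
$$

and by setting $x = A^1 z^1 + A^2 z^2$ and $y = z^2$ as before, we have

$$
-\delta^t \ln \delta - \varepsilon\,\gamma^t \ln \gamma \le \sum_i \exp(x_i) - \delta^t x - \gamma^t y + \varepsilon \sum_j \exp(y_j/\varepsilon).
$$

Thereby a new unconstrained form of the dual problem, (D′), for (P′) is obtained:

$$
\text{(D}'\text{)}\qquad \inf\; \xi(z) = \sum_i \exp\!\left(A^1 z^1 + A^2 z^2\right)_i + \varepsilon \sum_j \exp\!\left(z^2_j/\varepsilon\right) - b^{1t} z^1 - b^{2t} z^2.
$$

The duality theory of (P′) and (D′) is exactly that of the equality case presented in Brockett, Charnes and Cooper [1980]. Charnes, Cooper and Tyssedal [1983] proved that (P′) is equivalent to (P) as $\varepsilon$ approaches zero, and as a result (D′) gives the solution to (D).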
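The limiting behaviour can be illustrated numerically. The sketch below minimizes (D′) with SciPy's unconstrained BFGS method for a decreasing sequence of $\varepsilon$, warm-starting each solve from the previous multipliers, and compares the resulting estimate with the exact solution of the constrained problem. The four-outcome problem ($p_1 \le 0.1$, whose constrained solution is $(0.1, 0.3, 0.3, 0.3)$), the $\varepsilon$ values and the helper name `solve_smoothed_dual` are hypothetical choices for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def solve_smoothed_dual(A1, b1, A2, b2, eps, z0=None):
    """Hypothetical helper: unconstrained minimization of
    xi_eps(z) = sum(exp(A1 z1 + A2 z2)) + eps * sum(exp(z2 / eps)) - b1.z1 - b2.z2.
    """
    m1 = A1.shape[1]
    A = np.hstack([A1, A2])
    b = np.concatenate([b1, b2])

    def f_and_grad(z):
        delta = np.exp(A @ z)
        slack = np.exp(z[m1:] / eps)     # gamma = exp(z2 / eps): the slack variables
        val = delta.sum() + eps * slack.sum() - b @ z
        grad = A.T @ delta - b
        grad[m1:] += slack               # derivative of eps * exp(z2 / eps)
        return val, grad

    start = np.zeros(A.shape[1]) if z0 is None else z0
    res = minimize(f_and_grad, start, jac=True, method="BFGS",
                   options={"gtol": 1e-10})
    return np.exp(A @ res.x), res.x


A1 = np.ones((4, 1)); b1 = np.array([1.0])
A2 = np.array([[1.0], [0.0], [0.0], [0.0]]); b2 = np.array([0.1])
target = np.array([0.1, 0.3, 0.3, 0.3])  # solution of the constrained problem
errors, z = [], None
for eps in [1.0, 0.1, 0.01]:
    delta, z = solve_smoothed_dual(A1, b1, A2, b2, eps, z0=z)
    errors.append(np.abs(delta - target).max())
```

The term $\varepsilon \sum_j \exp(z^2_j/\varepsilon)$ acts as a smooth barrier: it is negligible for $z^2 < 0$ and grows rapidly for $z^2 > 0$, so the unconstrained problem increasingly mimics the constraint $z^2 \le 0$ as $\varepsilon$ shrinks.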

III. NUMERICAL EXAMPLES.

Two examples of maximum entropy estimates are presented. These are based on the oil spill problem in Thomas [1979]. The following four alternatives exemplify the decision problem for a particular harbor area (see Thomas [1979] for the details):

$a_1$: contract all clean-up activities

$a_2$: procure equipment A for open area spills and contract for pierside clean-up

$a_3$: contract for open area spills and procure equipment set B for pierside clean-up

$a_4$: procure equipment set C for all spills.

Once the maximum entropy distribution is derived, the expected annual cost of each alternative j, $E(A_j)$, for both problems can easily be calculated as

$$
E(A_j) = \sum_i AC_{ij}\, p_i,
$$

where $AC_{ij}$ is the annual cost of the j-th alternative in the i-th state. We provide the problem and solution for the maximum entropy estimation part only.
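Computing the expected costs for all alternatives at once is a single matrix-vector product. In the sketch below the three-state, two-alternative cost matrix and the distribution are hypothetical numbers made up for illustration; they are not Thomas's data.

```python
import numpy as np

# Hypothetical annual costs AC[i, j]: rows are states i, columns are alternatives j
AC = np.array([[10.0, 12.0],
               [20.0, 15.0],
               [30.0, 18.0]])
p = np.array([0.5, 0.3, 0.2])   # hypothetical maximum entropy distribution over states

# E(A_j) = sum_i AC[i, j] * p_i for every alternative j
expected_cost = AC.T @ p
```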

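The problem presented in Thomas [1979] reduces to the following:

$$
\begin{aligned}
\max\quad & -\sum_i p_i \ln p_i \\
\text{s.t.}\quad & \sum_i p_i = 1 \\
& .25 \le p_1 \le .60 \\
& p_i \le p_1, \quad i = 3, \ldots, 12 \\
& .25 \le p_2 \le .50 \\
& p_1 + p_2 \ge p_3 + p_5 + p_6 \\
& .10 \le p_3 \le .40 \\
& p_5 \ \text{[relation illegible in source]}\ p_6 \\
& 0 \le p_i \le .30, \quad i = 4, 11, 12 \\
& \text{[one constraint illegible in source]} \\
& 0 \le p_i \le .50, \quad i = 5, \ldots, 9 \\
& p_9 \ \text{[relation illegible in source]}\ p_8 \\
& 0 \le p_{10} \le .20 \\
& p_{12} \le p_{11} + p_6 \\
& p_7 = p_{10} + p_{11} + p_{12} \\
& p_i \ge 0
\end{aligned}
$$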

Freund and Saxena simplified the above problem by choosing only a subset of the given constraints, namely the interval constraints, the non-negativity constraints for the $p_i$'s and, of course, the normalizing constraint. Instead of solving the simplified problem of Freund and Saxena by the technical algorithm they present, we note that in this case the optimal solution can be obtained by intuition. By the maximum entropy principle, p would be a uniform distribution if there were no constraints other than the usual normalizing and non-negativity constraints. However, $p_1$, $p_2$ and $p_3$ must take the values of their respective lower bounds, since these lower bound values are greater than 1/12, the value of each probability in the uniform distribution giving maximum entropy. Given these lower bounds, the remaining probabilities $p_i$, $i = 4, \ldots, 12$, would strive to allocate the residual probability $1 - p_1 - p_2 - p_3$ uniformly, i.e. $p_i = .4/9$ $(i = 4, \ldots, 12)$. In fact, exactly the same optimal solution is obtained by solving the dual problem according to the technique given in this paper. The maximum entropy value is 2.1688. Freund and Saxena did not provide the probability distribution. However, the expected annual costs in their paper imply that their algorithm did not find the optimal solution. Their expected annual costs are (18.65, 17.33, 15.80, 16.33). These differ from our values (18.761, 17.428, 15.888, 16.555), which were obtained using the optimal solution to the dual of the constrained maximum entropy problem.
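The agreement between the intuitive solution and the dual solution can be checked with a short sketch. It assumes the simplified problem consists of the normalizing constraint plus the interval bounds of Thomas's problem (non-negativity is automatic, since $\delta = \exp(\cdot) > 0$), and writes each lower bound $p_i \ge l$ as $-p_i \le -l$ so that every bound fits the form $\delta^t A^2 \le b^{2t}$. The solver (SciPy's L-BFGS-B) and all variable names are our own choices.

```python
import numpy as np
from scipy.optimize import minimize

n = 12
# Interval bounds of the simplified problem (assumed from Thomas's interval constraints)
upper = {1: .60, 2: .50, 3: .40, 4: .30, 5: .50, 6: .50, 7: .50, 8: .50, 9: .50,
         10: .20, 11: .30, 12: .30}
lower = {1: .25, 2: .25, 3: .10}

# Every bound becomes one column a of A2 with delta^t a <= rhs
cols, rhs = [], []
for i, ub in upper.items():
    a = np.zeros(n); a[i - 1] = 1.0
    cols.append(a); rhs.append(ub)
for i, lb in lower.items():
    a = np.zeros(n); a[i - 1] = -1.0        # p_i >= lb  <=>  -p_i <= -lb
    cols.append(a); rhs.append(-lb)

A = np.column_stack([np.ones(n)] + cols)     # first column: normalizing constraint
b = np.array([1.0] + rhs)


def xi_and_grad(z):
    delta = np.exp(A @ z)
    return delta.sum() - b @ z, A.T @ delta - b


bounds = [(None, None)] + [(None, 0.0)] * len(cols)   # z1 free, z2 <= 0
res = minimize(xi_and_grad, np.zeros(A.shape[1]), jac=True, method="L-BFGS-B",
               bounds=bounds, options={"gtol": 1e-10, "ftol": 1e-15})
p = np.exp(A @ res.x)
H = -np.sum(p * np.log(p))
```

If the dual is solved accurately, p should come out close to $(.25, .25, .10, .4/9, \ldots, .4/9)$ with H close to the value 2.1688 reported above.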

Using the duality-based algorithm presented in this paper, we are able to go even further than Freund and Saxena and solve the original problem in Thomas. We obtain the optimal solution

$p_1^* = .25$, $p_2^* = .25$, $p_3^* = .10$, $p_4^* = .0472$, $p_5^* = .0472$, $p_6^* = .0472$,
$p_7^* = .08183$, $p_8^* = .0472$, $p_9^* = .0472$, $p_{10}^* = .0274$, $p_{11}^* = .0274$, $p_{12}^* = .0274$.

This has a maximum entropy value of $H(p^*) = 2.14425549$. The solution in Thomas is different from ours, even though it has only a slightly smaller $H(p)$ value. The resulting "optimal" probabilities differ quite a bit from our solution in some cases ($p_5 = .044$ instead of .0472, and another probability of .051 instead of .0472).
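As a quick arithmetic check of these figures, the sketch below recomputes the normalization and the entropy from the rounded probabilities. Because the published values are rounded to three to five decimals, agreement with $H(p^*) = 2.14425549$ can only be expected to about three decimal places.

```python
import math

# Optimal probabilities for Thomas's full problem, as reported (rounded)
p_star = [.25, .25, .10, .0472, .0472, .0472,
          .08183, .0472, .0472, .0274, .0274, .0274]

total = sum(p_star)                          # should be 1 up to rounding
H = -sum(p * math.log(p) for p in p_star)    # entropy of the rounded solution
```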

In the dual formulation of the maximum entropy problem, as we proved, the computation is easily accomplished using any of a number of existing non-linear programming codes, and it is guaranteed to obtain the optimal solution because of the special known parametric form of the primal problem and our dual programming technique for obtaining the parameters. It can be applied to both (D) and (D′). We find that the optimal solution of (D′) is very close to the optimal solution of (D), as expected, and so we might use (D′) instead of (D) in certain cases if we prefer unconstrained optimization. We can also extend the results of the Charnes-Cooper duality theory to continuous maximum entropy cases in the same manner (Charnes et al. [1978]). Additionally, all of the duality results (and the consequent computational savings) presented in this note on maximum entropy estimation carry over directly to minimum discrimination information (MDI) estimation with non-uniform "goal densities". See Charnes, Cooper and Seiford [1978] and Brockett, Charnes and Cooper [1980] for details.